Workflow 3: comparing observations to emissions

In addition to observation files, ancillary data can also be added to an openghg object store and used to perform analysis.

At the moment, the accepted files include:

  • Footprints - regional outputs from an LPDM model (e.g. NAME)

  • Emissions/Flux - estimates of species emissions within a region

  • Boundary conditions - to be added

  • Global CTM output (e.g. GEOSChem)

These inputs must adhere to an expected format and, at a minimum, must contain a fixed set of inputs.

At the moment, files in the expected format are created through standard methods from within the ACRG repository.

Loading data sources into the object store

For this tutorial we will again set up a temporary object store to store our data.

See 1_Adding_observation_data.ipynb for more details and advice on how to create a more permanent object store. Once a permanent object store is set up, these steps would only need to be performed once. Any added data can then be retrieved using searches.

import os
import tempfile

tmp_dir = tempfile.TemporaryDirectory()
os.environ["OPENGHG_PATH"] = tmp_dir.name   # temporary directory

We will add observation, footprint and flux data to the object store. This data relates to the Tacolneston (TAC) site within the DECC network and the area around Europe (the EUROPE domain).

from openghg.client import process_obs

obs_file = "../data/DECC/tac.picarro.1minute.100m.201208.dat"

site = "tac"
network = "DECC"
height = "100m"
species = "ch4"

process_obs(files=obs_file, data_type="CRDS", site=site, network=network, inlet=height)
Processing: tac.picarro.1minute.100m.201208.dat: 100%|██████████| 1/1 [00:00<00:00,  2.75it/s]

{'processed': {'tac.picarro.1minute.100m.201208.dat': {'ch4': '4a5d62d5-4065-49bf-aad8-c8c7b86e72e8',
   'co2': 'dc7a36ce-064b-4e2d-8464-8e3151cbdab5'}}}
from openghg.client import process_footprint

fp_file_path = "../data/footprints/TAC-100magl_EUROPE_201208.nc"

site = "tac"
height = "100m"
domain = "EUROPE"
model = "NAME"

process_footprint(files=fp_file_path, site=site, height=height, domain=domain, model=model)
{'tac_europe_NAME_100m': 'a571d49f-f769-4d36-bc7c-088e6ac15c70'}
from openghg.util import retrieve_example_data
from openghg.client import process_flux

flux_data = retrieve_example_data(path="flux/ch4-ukghg-all_EUROPE_2012.tar.gz")

site = "tac"
domain = "EUROPE"
date = "2012"

source_waste = "waste"
source_energyprod = "energyprod"

flux_data_waste = [filename for filename in flux_data if source_waste in str(filename)][0]
flux_data_energyprod = [filename for filename in flux_data if source_energyprod in str(filename)][0]

process_flux(files=flux_data_waste, species=species, source=source_waste, domain=domain, date=date)
process_flux(files=flux_data_energyprod, species=species, source=source_energyprod, domain=domain, date=date)
{'ch4_energyprod_europe_2012': 'bbb3dc2e-4cfc-45f2-a26e-ffb41009a4ba'}

Creating a model scenario

With this ancillary data, we can start to make comparisons between model data, such as bottom-up inventories, and our observations. This analysis is based around a ModelScenario class which can be created to link together observation, footprint and emissions data.

Boundary conditions and other model data will be added soon

Above we loaded observation data from the Tacolneston site into the object store. We also added an associated footprint (sensitivity map) and anthropogenic emissions maps, both for a domain defined over Europe.

To access and link this data we can set up our ModelScenario instance using a similar set of keywords. In this case we have also limited ourselves to a date range:

from openghg.analyse import ModelScenario

start_date = "2012-08-01"
end_date = "2012-09-01"

scenario = ModelScenario(site=site, 
                         inlet=height, 
                         domain=domain, 
                         species=species, 
                         source=source_waste, 
                         start_date=start_date,
                         end_date=end_date)
Adding obs_surface to model scenario
Updating any inputs based on observation data
site: tac, species: ch4, inlet: 100m
Adding footprint to model scenario
Adding flux to model scenario

Using these keywords, ModelScenario will search the object store and attempt to collect and attach observation, footprint and flux data. The observations will be stored as the ModelScenario.obs attribute. This is an ObsData object which contains metadata and data for your observations:

scenario.obs
ObsData(data=<xarray.Dataset>
Dimensions:                    (time: 20509)
Coordinates:
  * time                       (time) datetime64[ns] 2012-08-01T00:00:30 ... ...
Data variables:
    mf                         (time) float64 1.915e+03 1.912e+03 ... 1.942e+03
    mf_variability             (time) float64 1.431 2.018 1.767 ... 0.865 0.74
    mf_number_of_observations  (time) float64 20.0 19.0 20.0 ... 19.0 20.0 6.0
Attributes: (12/24)
    data_owner:            Simon O'Doherty
    data_owner_email:      s.odoherty@bristol.ac.uk
    inlet_height_magl:     100m
    comment:               Cavity ring-down measurements. Output from GCWerks
    long_name:             tacolneston
    site:                  tac
    ...                    ...
    sampling_period_unit:  s
    station_longitude:     1.13872
    station_latitude:      52.51775
    station_long_name:     Tacolneston Tower, UK
    station_height_masl:   50.0
    scale:                 WMO-X2004A, metadata={'site': 'tac', 'instrument': 'picarro', 'sampling_period': '60', 'inlet': '100m', 'port': '9', 'type': 'air', 'network': 'decc', 'species': 'ch4', 'calibration_scale': 'wmo-x2004a', 'long_name': 'tacolneston', 'data_owner': "Simon O'Doherty", 'data_owner_email': 's.odoherty@bristol.ac.uk', 'station_longitude': 1.13872, 'station_latitude': 52.51775, 'station_long_name': 'Tacolneston Tower, UK', 'station_height_masl': 50.0, 'inlet_height_magl': '100m', 'data_type': 'timeseries', 'comment': 'Cavity ring-down measurements. Output from GCWerks', 'conditions_of_use': 'Ensure that you contact the data owner at the outset of your project.', 'source': 'In situ measurements of air', 'Conventions': 'CF-1.8', 'file_created': '2022-03-22 10:42:54.957441+00:00', 'processed_by': 'OpenGHG_Cloud', 'sampling_period_unit': 's', 'scale': 'WMO-X2004A'})

To access the underlying xarray Dataset containing the observation data use ModelScenario.obs.data:

ds = scenario.obs.data

The ModelScenario.footprint attribute contains the linked FootprintData (again, use .data to extract xarray Dataset):

scenario.footprint
FootprintData(data=<xarray.Dataset>
Dimensions:               (time: 372, lon: 391, lat: 293, lev: 1, height: 20)
Coordinates:
  * time                  (time) datetime64[ns] 2012-08-01 ... 2012-08-31T22:...
  * lon                   (lon) float32 -97.9 -97.55 -97.2 ... 38.68 39.03 39.38
  * lat                   (lat) float32 10.73 10.96 11.2 ... 78.59 78.82 79.06
  * lev                   (lev) object 'From     0 -    40m agl'
  * height                (height) float32 500.0 1.5e+03 ... 1.85e+04 1.95e+04
Data variables:
    fp                    (lat, lon, time) float32 0.0 0.0 ... 7.629e-06
    temperature           (time) float32 15.64 14.35 14.01 ... 14.08 13.07 13.31
    pressure              (time) float32 994.9 994.3 ... 1.01e+03 1.009e+03
    wind_speed            (time) float32 8.483 8.844 8.768 ... 2.48 3.621 4.743
    wind_direction        (time) float32 193.2 203.1 201.4 ... 42.42 117.3 172.3
    PBLH                  (time) float32 510.2 341.6 291.5 ... 132.5 92.75 92.75
    release_lon           (time) float32 1.139 1.139 1.139 ... 1.139 1.139 1.139
    release_lat           (time) float32 52.52 52.52 52.52 ... 52.52 52.52 52.52
    particle_locations_n  (height, lon, time) float32 0.0 3.248e-05 ... 0.0 0.0
    particle_locations_e  (height, lat, time) float32 0.0 0.0 0.0 ... 0.0 0.0
    particle_locations_s  (height, lon, time) float32 0.0 0.0 0.0 ... 0.0 0.0
    particle_locations_w  (height, lat, time) float32 0.0 0.0 0.0 ... 0.0 0.0
Attributes: (12/16)
    author:           OpenGHG Cloud
    processed:        2022-03-22 10:42:55.132208+00:00
    data_type:        footprints
    site:             tac
    height:           100m
    domain:           europe
    ...               ...
    min_longitude:    -97.9
    max_latitude:     79.057
    min_latitude:     10.729
    time_resolution:  standard_time_resolution
    heights:          [  500.  1500.  2500.  3500.  4500.  5500.  6500.  7500...
    variables:        ['fp', 'temperature', 'pressure', 'wind_speed', 'wind_d..., metadata={'data_type': 'footprints', 'site': 'tac', 'height': '100m', 'domain': 'europe', 'model': 'name', 'start_date': '2012-08-01 00:00:00+00:00', 'end_date': '2012-08-31 22:00:00+00:00', 'max_longitude': 39.38, 'min_longitude': -97.9, 'max_latitude': 79.057, 'min_latitude': 10.729, 'time_resolution': 'standard_time_resolution', 'heights': [500.0, 1500.0, 2500.0, 3500.0, 4500.0, 5500.0, 6500.0, 7500.0, 8500.0, 9500.0, 10500.0, 11500.0, 12500.0, 13500.0, 14500.0, 15500.0, 16500.0, 17500.0, 18500.0, 19500.0], 'variables': ['fp', 'temperature', 'pressure', 'wind_speed', 'wind_direction', 'pblh', 'release_lon', 'release_lat', 'particle_locations_n', 'particle_locations_e', 'particle_locations_s', 'particle_locations_w']}, flux={}, bc={}, species='NA', scales='FIXME', units='FIXME')

And the ModelScenario.fluxes attribute can be used to access the FluxData. Note that for ModelScenario.fluxes this can contain multiple flux sources and so this is stored as a dictionary linked to the source name:

scenario.fluxes
{'waste': FluxData(data=<xarray.Dataset>
 Dimensions:  (time: 1, lat: 293, lon: 391)
 Coordinates:
   * time     (time) datetime64[ns] 2012-01-01
   * lat      (lat) float32 10.73 10.96 11.2 11.43 ... 78.36 78.59 78.82 79.06
   * lon      (lon) float32 -97.9 -97.55 -97.2 -96.84 ... 38.32 38.68 39.03 39.38
 Data variables:
     flux     (time, lat, lon) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
 Attributes: (12/17)
     Processed by:     cv18710@bristol.ac.uk
     Processed on:     2020-11-06 11:55:50.312453+00:00
     Raw file used:    /home/cv18710/work_shared/Gridded_fluxes/CH4/UKGHG/uk_f...
     author:           OpenGHG Cloud
     species:          ch4
     domain:           europe
     ...               ...
     max_longitude:    39.38
     min_longitude:    -97.9
     max_latitude:     79.057
     min_latitude:     10.729
     time_resolution:  standard
     frequency:        annual, metadata={'processed by': 'cv18710@bristol.ac.uk', 'processed on': '2020-11-06 11:55:50.312453+00:00', 'raw file used': '/home/cv18710/work_shared/gridded_fluxes/ch4/ukghg/uk_flux_waste_ch4_lonlat_0.01km_2012.nc', 'species': 'ch4', 'domain': 'europe', 'source': 'waste', 'date': '2012', 'author': 'openghg cloud', 'processed': '2022-03-22 10:43:00.404396+00:00', 'start_date': '2012-01-01 00:00:00+00:00', 'end_date': '2012-12-31 23:59:59+00:00', 'max_longitude': 39.38, 'min_longitude': -97.9, 'max_latitude': 79.057, 'min_latitude': 10.729, 'time_resolution': 'standard', 'frequency': 'annual', 'data_type': 'emissions'}, flux={}, bc={}, species='ch4', scales='FIXME', units='FIXME')}

An interactive plot for the linked observation data can be plotted using the ModelScenario.plot_timeseries() method:

scenario.plot_timeseries()

You can also set up your own searches and add this data directly.

from openghg.retrieve import get_obs_surface, get_footprint, get_flux

# Extract obs results from object store
obs_results = get_obs_surface(site=site,
                              species=species,
                              inlet=height,
                              start_date="2012-08-01",
                              end_date="2012-09-01")

# Extract footprint results from object store
footprint_results = get_footprint(site=site,
                                  domain=domain,
                                  height=height,
                                  start_date="2012-08-01",
                                  end_date="2012-09-01")

# Extract flux results from object store
flux_results = get_flux(species=species,
                        domain=domain,
                        source=source_waste)
scenario_direct = ModelScenario(obs=obs_results, footprint=footprint_results, flux=flux_results)
Updating any inputs based on observation data
site: tac, species: ch4, inlet: 100m

You can create your own input objects directly and add these in the same way. This allows you to bypass the object store for experimental examples. At the moment these inputs need to be ObsData, FootprintData or FluxData objects (which can be created using classes from openghg.dataobjects), but simpler inputs will be made available.

One benefit of this interface is that it avoids repeated searches of the object store when the same data needs to be used for multiple different scenarios.

Comparing data sources

Once your ModelScenario has been created you can start to use the linked data to compare outputs. For example, we may want to calculate modelled observations at our site based on our linked footprint and emissions data:

modelled_observations = scenario.calc_modelled_obs()
Caching calculated data
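Conceptually, the modelled observations are formed by multiplying the footprint sensitivity by the flux map and summing over the spatial domain at each time step. The following numpy sketch is a toy illustration of that idea only, not the openghg implementation (which also handles grid alignment, resampling and units):

```python
import numpy as np

# Toy domain: 3 lat x 4 lon grid, 2 footprint time steps
rng = np.random.default_rng(0)
fp = rng.random((3, 4, 2))    # footprint sensitivity, dims (lat, lon, time)
flux = rng.random((3, 4))     # flux map, dims (lat, lon)

# Modelled enhancement at each time step: multiply and sum over the domain
mf_mod = (fp * flux[:, :, np.newaxis]).sum(axis=(0, 1))

print(mf_mod.shape)  # (2,): one enhancement value per footprint time step
```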

This can then be plotted directly using the xarray plotting methods:

modelled_observations.plot()  # Can plot using xarray plotting methods
[<matplotlib.lines.Line2D at 0x7f09b4ee4e80>]
../../_images/3_Comparing_with_emissions_27_1.png

To compare these modelled observations to the observations themselves, the ModelScenario.plot_comparison() method can be used:

scenario.plot_comparison()

The ModelScenario.footprints_data_merge() method can also be used to create a combined output, with all aligned data stored directly within an xarray.Dataset:

combined_dataset = scenario.footprints_data_merge()
combined_dataset
WARNING: calc_bc keyword has not been used to create output
 BoundaryConditions have not been implemented yet.
<xarray.Dataset>
Dimensions:                    (time: 372, lon: 391, lat: 293, lev: 1,
                                height: 20)
Coordinates:
  * time                       (time) datetime64[ns] 2012-08-01T00:00:30 ... ...
  * lon                        (lon) float32 -97.9 -97.55 -97.2 ... 39.03 39.38
  * lat                        (lat) float32 10.73 10.96 11.2 ... 78.82 79.06
  * lev                        (lev) object 'From     0 -    40m agl'
  * height                     (height) float32 500.0 1.5e+03 ... 1.95e+04
Data variables: (12/16)
    mf                         (time) float64 1.922e+03 1.934e+03 ... 1.921e+03
    mf_variability             (time) float64 3.155 2.061 3.597 ... 0.237 0.5685
    mf_number_of_observations  (time) float64 19.56 19.86 19.36 ... 19.68 19.43
    fp                         (lat, lon, time) float32 0.0 0.0 ... 7.629e+03
    temperature                (time) float32 15.64 14.35 14.01 ... 13.07 13.31
    pressure                   (time) float32 994.9 994.3 ... 1.01e+03 1.009e+03
    ...                         ...
    release_lat                (time) float32 52.52 52.52 52.52 ... 52.52 52.52
    particle_locations_n       (height, lon, time) float32 0.0 3.248e-05 ... 0.0
    particle_locations_e       (height, lat, time) float32 0.0 0.0 ... 0.0 0.0
    particle_locations_s       (height, lon, time) float32 0.0 0.0 ... 0.0 0.0
    particle_locations_w       (height, lat, time) float32 0.0 0.0 ... 0.0 0.0
    mf_mod                     (time) float32 37.35 35.34 27.8 ... 14.56 24.52
Attributes: (12/40)
    author:                OpenGHG Cloud
    processed:             2022-03-22 10:42:55.132208+00:00
    data_type:             footprints
    site:                  tac
    height:                100m
    domain:                europe
    ...                    ...
    station_longitude:     1.13872
    station_latitude:      52.51775
    station_long_name:     Tacolneston Tower, UK
    station_height_masl:   50.0
    scale:                 WMO-X2004A
    resample_to:           coarsest

When the same calculation is needed by multiple methods, the last result is cached so that outputs can be produced more efficiently. For large datasets, caching can be disabled by passing cache=False.

For a ModelScenario object, different analyses can be performed on this linked data. For example, if a daily average of the modelled observations was required, we could calculate this by setting the resample_to input to "1D" (matching the available pandas time aliases):
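The resample_to input accepts pandas offset aliases. As a standalone reminder of how these aliases behave, here is a small pandas example (using pandas directly, independent of openghg):

```python
import pandas as pd

# Hourly values spanning two days
index = pd.date_range("2012-08-01", periods=48, freq="h")
series = pd.Series(range(48), index=index)

# "1D" is the pandas offset alias for one day
daily = series.resample("1D").mean()
print(daily)  # two values: one daily mean per day
```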

modelled_observations_1D = scenario.calc_modelled_obs(resample_to="1D")
modelled_observations_1D.plot()
Caching calculated data
[<matplotlib.lines.Line2D at 0x7f0990631910>]
../../_images/3_Comparing_with_emissions_34_2.png

To allow comparisons with multiple flux inputs, more than one flux source can be linked to your ModelScenario. This can either be done upon creation or by using the add_flux() method. When calculating modelled observations, these flux sources will be aligned in time and stacked to create a total output:

scenario.add_flux(species=species, domain=domain, source=source_energyprod)
Adding flux to model scenario
scenario.plot_comparison()
Caching calculated data
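The stacking step can be illustrated with a toy example: once sources share the same grid and times, the total flux is simply their sum. A small numpy sketch with hypothetical values:

```python
import numpy as np

# Two flux sources on the same 2 x 2 grid (hypothetical values)
flux_waste = np.array([[1.0, 2.0],
                       [0.5, 0.0]])
flux_energyprod = np.array([[0.2, 0.0],
                            [0.3, 1.0]])

# Aligned sources are stacked into a single total flux map
flux_total = flux_waste + flux_energyprod
print(flux_total)  # [[1.2, 2.0], [0.8, 1.0]]
```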

Output for individual sources can also be created by specifying the sources as an input:

# Include the recalculate option to ensure this is recalculated rather than read from cached data.
modelled_obs_energyprod = scenario.calc_modelled_obs(sources="energyprod", recalculate=True)
modelled_obs_energyprod.plot()
Caching calculated data
[<matplotlib.lines.Line2D at 0x7f0990634fd0>]
../../_images/3_Comparing_with_emissions_39_2.png

Plotting functions to be added for 2D / 3D data
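Until then, gridded fields can be reduced to 2D and plotted with standard xarray or matplotlib tools. The reduction itself is just a mean over the time axis; a toy numpy sketch (illustrative only, not an openghg method):

```python
import numpy as np

# Toy (lat, lon, time) footprint field
rng = np.random.default_rng(1)
fp = rng.random((3, 4, 5))

# Average over time to get a 2D (lat, lon) field,
# suitable for e.g. matplotlib's pcolormesh
fp_mean = fp.mean(axis=-1)
print(fp_mean.shape)  # (3, 4)
```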